The genome sequence of a drosophilid fruit fly, Drosophila histrio (Meigen, 1830)

We present a genome assembly from an individual female Drosophila histrio (the drosophilid fruit fly; Arthropoda; Insecta; Diptera; Drosophilidae). The genome sequence is 189.2 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 16.02 kilobases in length.


Background
Drosophila histrio Meigen, 1830 is a large (4.5-6 mm) yellow-brown drosophilid 'fruit fly' (Figure 1A and 1B), distantly related to the laboratory model Drosophila melanogaster. The species is broadly distributed in wooded areas across the Palaearctic, from Portugal in the west to Japan and the Kuril Islands in the far east, and from central China in the south to the north of Norway (Bächli, 2023). It is one of around 30 British and Irish species of Drosophila (Chandler, 2021) and, like most of its close relatives, it is a specialist fungus breeder (Shorrocks, 1977). Adults tend to hug the forest floor and prefer decomposing and soft ephemeral fungal fruiting bodies, into which females lay a large number of small eggs (Kimura & Toda, 1989; Toda & Kimura, 1997).
Drosophila histrio appears notably less abundant than many other UK fungus-specialist drosophilid species (Shorrocks, 1977), but adults have been collected from a range of fungi, such as Boletus edulis (Morris, 2011), Polyporus squamosus (Chandler, 2021), and Hypholoma fasciculare (Figure 1C). Elsewhere, flies have also been reported from Pleurotus species (Kimura & Toda, 1989), and from species of Lactarius, Collybia and Russula (Burla & Bächli, 1968). Although there are relatively few British records, D. histrio is not thought to be threatened; adults are regularly recorded in the south of the UK, with reports increasing from June to October (GBIF, 2023).
Here we present a chromosomally complete genome sequence for Drosophila histrio, derived from the DNA of two female offspring of a wild female collected from a sulphur tuft fungus (Hypholoma fasciculare) on the Penns in the Rocks estate, East Sussex. The sequence was generated as part of the Darwin Tree of Life Project, a collaborative effort to sequence all named eukaryotic species in the Atlantic Archipelago of Britain and Ireland. This genome sequence is helping to resolve relationships among the Drosophilidae (Kim et al., 2023), and will further build on the value of this family as a model clade for comparative genomics and molecular evolution.

Genome sequence report
The genome was sequenced from one female Drosophila histrio (Figure 1) reared at the Institute of Ecology and Evolution, University of Edinburgh, Scotland, UK (55.92, -3.17). A total of 45-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 161 missing joins or misjoins and removed 11 haplotypic duplications, reducing the assembly length by 0.44% and the scaffold number by 77.13%, and increasing the scaffold N50 by 15.39%. The final assembly has a total length of 189.2 Mb in 42 sequence scaffolds with a scaffold N50 of 62.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.66%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The X chromosome was identified by synteny with that of Drosophila phalerata idDroPhal2.1 (GCA_951394115.1) and has a dot chromosome fusion. The 0-15 Mb region of chromosome X is of unknown order and orientation. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
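For readers unfamiliar with the contiguity metrics used in this report, the following minimal Python sketch shows how a scaffold Nx value (N50, N90) is conventionally defined. It is illustrative only: the function name `nx` and the toy scaffold lengths are hypothetical, and the code is not taken from the software used to produce this assembly.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx length of an assembly.

    Scaffolds are sorted from longest to shortest; Nx is the length of the
    scaffold at which the running total first reaches `fraction` of the
    total assembly span (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]


# Hypothetical toy scaffold lengths (arbitrary units), NOT the real assembly.
toy_scaffolds = [50, 40, 30, 20, 10]
n50 = nx(toy_scaffolds, 0.5)
n90 = nx(toy_scaffolds, 0.9)
print(n50, n90)
```

Because Nx is read from the size-ordered cumulative total, N90 can never exceed N50, and neither can exceed the longest scaffold.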

Sample acquisition and nucleic acid extraction
Drosophila histrio specimens were first-generation female progeny from a wild-collected female. The sequenced flies were reared at the University of Edinburgh (latitude 55.92, longitude -3.17), and were harvested on 2021-11-10. The mother was collected from a sulphur tuft fungus (Hypholoma fasciculare) at Penns in the Rocks Estate, East Sussex, England (latitude 51.093, longitude 0.170) on 2021-09-07. The fly was collected and identified by Darren Obbard (University of Edinburgh), and species identification was confirmed by examination of the progeny. Flies were reared on laboratory Drosophila medium with the addition of a ~2 cm³ piece of commercial mushroom (Agaricus bisporus) to encourage egg laying. Each living anaesthetised fly was placed directly into the collection tube and frozen from live at -80 °C. The sample with specimen ID SAN00002002 (ToLID idDroHist2) was used for DNA sequencing and the sample with specimen ID SAN00002003 (ToLID idDroHist3) was used for Hi-C scaffolding.
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: [...] (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.
Table 3 contains a list of relevant software tool versions and sources.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination

Wellcome Sanger Institute -Legal and Governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.
Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

○ More details regarding library preparation would be helpful, for example referencing a kit or protocol.

○ Details of the analytical pipeline should be included, ideally as a supplement with annotated code. If default flags/options were used in the steps, stating so would be a simple way to add clarity, but if specific flags/options were used these should be noted.

1. In the background, the authors provide a brief description of the organism and its geographical distribution. They also provide a rationale for sequencing the organism. If space allows, it would help the reader to know of previous efforts in characterizing this organism and anything known beforehand, such as previous estimates of genome size or number of genes, among others. This is a minor comment, as the background is ample in its current format.
2. In the genome report, the authors could consider using GenomeScope to estimate heterozygosity (since they are using a wild strain) and genome size. It would be interesting to see whether the final assembly falls within the size predicted by GenomeScope. This is a minor comment, since their k-mer completeness already showed 100%.
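To illustrate the kind of k-mer-based size check suggested in this comment, the following Python sketch estimates genome size from a k-mer multiplicity histogram by dividing the total number of k-mer occurrences by the depth of the main coverage peak. This is a deliberately simplified heuristic and not GenomeScope's method, which fits a mixture model that also accounts for heterozygosity and repeats. The function name, the `min_depth` cut-off and the toy histogram are all hypothetical and are not derived from the D. histrio data.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Crude genome size estimate from a k-mer histogram.

    `histogram` maps k-mer multiplicity (depth) to the number of distinct
    k-mers observed at that depth. Depths below `min_depth` are treated as
    sequencing errors and ignored (an assumed, arbitrary cut-off).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the main coverage peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences contributed by the retained depths.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram: an error tail at depths 1-2, main peak at depth 20.
toy_histogram = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size = estimate_genome_size(toy_histogram)
print(size)
```

In a heterozygous wild-derived sample the histogram typically shows a second peak at roughly half the main depth, which this sketch does not model; a full model fit would be needed for a heterozygosity estimate.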
Overall, the authors provide a high-quality genome and also assign much of it to chromosomes. Although much work remains to order the scaffolds, close the gaps, and structurally and functionally annotate the genome, the current work will indeed be valuable to the whole community. I therefore recommend the publication of this genome so that this resource becomes widely accessible to the scientific community.
Is the rationale for creating the dataset(s) clearly described? Yes
Reviewer Expertise: Genomics, bioinformatics, evolutionary biology

Figure 1. A: Male (above) and female (below) Drosophila histrio presented with a 5 mm scale bar. B: The four lab-reared siblings selected for sequencing. Sample SAMEA12110798 (centre right) was used for Hi-C sequencing, and sample SAMEA12110797 (centre left) was used for PacBio sequencing. C: The sulphur tuft fungus (Hypholoma fasciculare) from which the mother of the sequenced flies was collected (Penns in the Rocks Estate, East Sussex, England; 51.093N, 0.1698E).

Figure 2. Genome assembly of Drosophila histrio, idDroHist2.2: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 189,265,883 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (65,316,020 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (62,479,567 and 58,608,496 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATQJB01/dataset/CATQJB01/snail.
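The size-ordered binning described in this caption can be sketched in Python as follows. This is a simplified illustration under stated assumptions and is not BlobToolKit's implementation: the function name `snail_bins` and the toy scaffold lengths are hypothetical, and each bin is simply assigned the length of the scaffold that covers the end of that bin in the size-ordered cumulative layout. Only the assembly span and bin count are taken from the caption.

```python
from typing import List, Sequence


def snail_bins(lengths: Sequence[int], n_bins: int = 1000) -> List[int]:
    """Assign one scaffold length to each of `n_bins` equal slices of the span.

    Scaffolds are laid end to end from longest to shortest; bin b is given
    the length of the scaffold covering the end position of that bin.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")
    span = sum(ordered)
    bins: List[int] = []
    idx = 0
    cumulative = ordered[0]
    for b in range(n_bins):
        # Compare cumulative/span with (b + 1)/n_bins using integers only.
        while cumulative * n_bins < (b + 1) * span and idx < len(ordered) - 1:
            idx += 1
            cumulative += ordered[idx]
        bins.append(ordered[idx])
    return bins


# Bin width implied by the caption: 0.1% of the 189,265,883 bp assembly.
bin_size_bp = 189_265_883 / 1000

# Hypothetical toy scaffold lengths, NOT the real assembly.
toy_bins = snail_bins([60, 30, 10], n_bins=10)
print(bin_size_bp, toy_bins)
```

Under this layout the N50 and N90 arcs correspond to the scaffold lengths recorded in the bins at 50% and 90% of the circumference.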

Figure 3. Genome assembly of Drosophila histrio, idDroHist2.2: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATQJB01/dataset/CATQJB01/blob.

Figure 4. Genome assembly of Drosophila histrio, idDroHist2.2: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CATQJB01/dataset/CATQJB01/cumulative.

Figure 5. Genome assembly of Drosophila histrio, idDroHist2.2: Hi-C contact map of the idDroHist2.2 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=J3MI3e8oQYSdTkL7Bwflrw.

○ Figure 3 could be improved by rotating the y axis labels 180 degrees. The legend is overlapping part of the upper panel and the color differences are not very effective because the lower left panel has differences in alpha level which correspond with the figure legend. Are all of the color-coded categories visible? It is not clear to me that anything is gained with the interactive version of the plot.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Quantitative and evolutionary genetics, transcriptomics, genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 15 July 2024
https://doi.org/10.21956/wellcomeopenres.22834.r86351
© 2024 Bayega A. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Anthony Bayega
McGill Genome Centre, Department of Human Genetics, McGill University, Montreal, Québec, Canada
Obbard et al. provide an overview description of the genome of Drosophila histrio which they sequenced and assembled. I commend them for this. My comments follow:

Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format?

Gen Morinaga
University of Calgary, Calgary, Alberta, Canada
In the article titled "The genome sequence of a drosophilid fruit fly, Drosophila histrio (Meigen, 1830)", Obbard et al. combine PacBio HiFi and Hi-C sequencing data to assemble a chromosome-scale genome of the fruit fly Drosophila histrio. Their protocol is adequate (if a bit brief), and the resulting assembly is highly complete and contiguous. If I had to criticize anything about the paper as it is, it would be that the bioinformatic protocol is very brief: I generally prefer getting more details from methods sections (e.g., flags used for each program). Aside from this (very) minor criticism, I believe the work presented is a valuable addition to the growing library of high-quality invertebrate genomes and should prove useful to those studying Drosophila or Diptera biology.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.